Abstract: This paper presents an efficient implementation of a
high-speed multiplier using the shift-and-add method of the
Baugh-Wooley multiplier. This parallel multiplier uses fewer
adders and fewer iterative steps, and as a result occupies less
area than a serial multiplier. This is an important criterion
because chip fabrication and high-performance systems require
components that are as small as possible. Experimental results
demonstrate that the proposed circuit not only improves
performance but also reduces hardware complexity and power
consumption: it requires a dynamic power of 15.3 mW and a
maximum clock period of 3.912 ns, which is very efficient
compared to the reference paper.


Keywords: Baugh-Wooley Multiplier, Pipeline Register,
Power Efficient, Carry Save Adder.


I. INTRODUCTION 


Multiplication involves two basic operations: the generation
of the partial products and their accumulation [5]. There are
therefore two possible ways to speed up multiplication:
reducing the complexity of generating the partial products,
and reducing the time needed to accumulate them. Both
solutions can be applied simultaneously.


Baugh-Wooley Two’s Complement Signed Multiplier: Two’s
complement is the most popular method of representing signed
integers in computer science. It is also the operation of
negation (converting positive numbers to negative, or vice
versa) in computers that represent negative numbers in two’s
complement. It is so widely used because the addition and
subtraction circuitry does not need to examine the signs of
the operands to determine whether to add or subtract. Two’s
complement and one’s complement representations are commonly
used since they make arithmetic units simpler to design.
Fig. 1(a) shows the two’s complement and one’s complement
representations.


Positive |               Negative Integers
Integers | Sign & Magnitude | 2's Complement | 1's Complement
---------+------------------+----------------+---------------
0000     | 1000             | ----           | 1111
0001     | 1001             | 1111           | 1110
0010     | 1010             | 1110           | 1101
0011     | 1011             | 1101           | 1100
0100     | 1100             | 1100           | 1011
0101     | 1101             | 1011           | 1010
0110     | 1110             | 1010           | 1001
0111     | 1111             | 1001           | 1000
----     | ----             | 1000           | ----


Fig. 1(a): Two’s complement and one’s complement representation
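As an illustration of the three encodings in Fig. 1(a), the following minimal Python sketch computes the 4-bit patterns of a signed integer. It is not part of the original design: the function names (sign_magnitude, ones_complement, twos_complement) and the default width of 4 bits are assumptions chosen for illustration.

```python
def sign_magnitude(value: int, bits: int = 4) -> str:
    """Sign bit followed by the magnitude, e.g. -5 -> 1101."""
    sign = 1 if value < 0 else 0
    return format((sign << (bits - 1)) | abs(value), f"0{bits}b")


def ones_complement(value: int, bits: int = 4) -> str:
    """Negative values invert every bit of the magnitude, e.g. -5 -> 1010."""
    mask = (1 << bits) - 1
    pattern = value if value >= 0 else ~(-value) & mask
    return format(pattern, f"0{bits}b")


def twos_complement(value: int, bits: int = 4) -> str:
    """Negative values wrap modulo 2**bits, e.g. -5 -> 1011."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")


if __name__ == "__main__":
    # One row per magnitude: positive pattern, then the three negative encodings.
    for k in range(1, 8):
        print(format(k, "04b"), sign_magnitude(-k),
              twos_complement(-k), ones_complement(-k))
```

Only two's complement has a pattern for -8 in four bits (1000), which is why that row of the figure is empty in the other two columns.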


Baugh-Wooley Two’s Complement Signed Numbers: The
Baugh-Wooley two’s complement signed multiplier is the
best-known algorithm for signed multiplication because it
maximizes the regularity of the multiplier and allows all the
partial products to have positive sign bits [3]. The
Baugh-Wooley technique was developed to design direct
multipliers for two’s complement numbers [9]. When two’s
complement numbers are multiplied directly, each of the
partial products to be added is a signed number. Thus each
partial product has to be sign-extended to the width of the
final product in order to form a correct sum in the Carry Save
Adder (CSA) tree. The Baugh-Wooley approach suggests an
efficient method of adding extra entries to the bit matrix to
avoid having to deal with the negatively weighted bits in the
partial product matrix. Fig. 1(b) and (c) show the partial
product arrays for 5*5-bit unsigned and signed multiplication.


Fig. 1(b): 5*5 unsigned multiplications
Fig. 1(c): 5*5 signed multiplications


Figure 1(c) shows how this algorithm works in the case of a
5x5 multiplication. The first three rows are referred to as
PM (partial products with magnitude part) and are each
generated by one NAND and three AND operations. The fourth
row is called PS (partial products with sign bit) and is
generated by one AND and three NAND operations. Consider the
partial products of PM. Suppose b2 = b0 in Figure 1(c). Then
the third row can be obtained by shifting the first row by
2 bits. Likewise, a shift operation can be used to obtain a
partial product at a different bit level, as in
sign-magnitude multiplication.
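The partial-product arrangement described above can be sketched as a behavioural model in Python. This is a sketch under stated assumptions, not the paper's VHDL: it uses the commonly taught Baugh-Wooley formulation, in which every product term involving exactly one sign bit is complemented (a NAND) and constant 1s are added at bit positions n and 2n-1. Whether Fig. 1(c) uses exactly this variant is an assumption, and the name baugh_wooley_multiply is hypothetical.

```python
def baugh_wooley_multiply(a: int, b: int, n: int = 4) -> int:
    """Multiply two n-bit two's complement integers via the Baugh-Wooley matrix."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError(f"operands must fit in {n}-bit two's complement")

    # Two's complement bit patterns, least significant bit first.
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]

    total = 0
    for j in range(n):          # one row of the matrix per multiplier bit
        for i in range(n):
            bit = a_bits[i] & b_bits[j]              # AND term
            if (i == n - 1) != (j == n - 1):
                bit ^= 1                             # exactly one sign bit: NAND term
            total += bit << (i + j)

    # Correction constants (assumed variant): 1s at positions n and 2n-1.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1                      # keep the 2n-bit product

    # Reinterpret the 2n-bit pattern as a signed value.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total
```

With n = 4 the loop yields three rows that each contain three AND terms and one NAND term, and a final row with one AND and three NAND terms, which matches the PM and PS row counts given in the text. For example, baugh_wooley_multiply(-5, 3) returns -15.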


Baugh-Wooley schemes become area-consuming when the operands
are 32 bits or wider. The rest of the paper is organised as
follows. The Baugh-Wooley architecture is explained in
Section II. Section III presents implementation results for
4-bit multipliers in terms of power, area, and speed,
together with a comparison.


II. BAUGH-WOOLEY ARCHITECTURE


The hardware architecture of the Baugh-Wooley multiplier is
shown in Fig. 2. It follows the left-shift algorithm. A MUX
(multiplexer) selects which bit is multiplied.


Suppose we add +5 and -5 in decimal; we get 0. Representing
these numbers in 4-bit 2’s complement form gives +5 as 0101
and -5 as 1011. Adding these two numbers gives 10000.
Discarding the carry out of the fourth bit leaves 0000,
that is, 0.
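The same arithmetic can be checked with a few lines of Python. This is only an illustrative sketch; the names WIDTH and MASK are assumptions introduced here, and masking with MASK plays the role of discarding the carry.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1          # 0b1111 keeps the low four bits

plus_five = 5 & MASK             # 0b0101
minus_five = -5 & MASK           # 0b1011, two's complement of 5
raw_sum = plus_five + minus_five # 0b10000, includes the carry out
result = raw_sum & MASK          # carry discarded

print(format(plus_five, "04b"), format(minus_five, "04b"),
      format(raw_sum, "05b"), format(result, "04b"))
```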
Fig 2(a): Signed 2’s-Complement Baugh-Wooley Hardware Multiplier


Baugh-Wooley Multiplier [5]: 


The Baugh-Wooley multiplier is used for both unsigned and
signed multiplication. Signed operands are represented in
2’s complement form. The partial products are adjusted such
that the negative signs move to the last step, which in turn
maximizes the regularity of the multiplication array. The
Baugh-Wooley multiplier thus operates on signed operands in
2’s complement representation while making sure that the
signs of all partial products are positive.


This design uses fewer steps and fewer adders. The inputs
are a0, a1, a2, a3 and b0, b1, b2, b3, and the outputs are
p0, p1, ..., p7. Because I use pipeline registers in this
architecture, it takes less time to multiply large 2’s
complement numbers, provided they are less than 32 bits
wide. Above 32 bits, the Modified Baugh-Wooley Multiplier
is used.


Fig 2(b): Block diagram of a 4*4 Baugh-Wooley multiplier 


III. IMPLEMENTATION RESULTS AND COMPARISONS 


In this paper, 4-bit pipelined multipliers are implemented
in VHDL. Logic simulation is done using ModelSim Designer,
and synthesis is done using Xilinx ISE 8.2i for device
4VFX20FF672-12. The synthesis results of the 4-bit pipelined
Baugh-Wooley architecture are shown in Table 1 for the
reference paper and in Table 2 for my paper.


In my design, I use the Brent-Kung adder (BK adder), an
advanced parallel-prefix adder that offers a very good
balance between area and power cost while still delivering
good performance. This adder has a complex carry tree and
inverse carry tree; that is, its prefix network is divided
into two parts, a tree and an inverse tree. The upper tree
combines positions at periodic powers of 2. The inverse tree
is offset by 1, beginning from the bottom of the matrix and
expanding outwards in powers of 2.
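The two-tree structure can be illustrated with a bit-level Python model of a Brent-Kung prefix network. This is a sketch of the textbook adder, assuming a power-of-two width and a carry-in of 0; it is not taken from the paper's VHDL, the paper does not state where the adder sits in the design, and the name brent_kung_add is hypothetical.

```python
from typing import Tuple


def brent_kung_add(a: int, b: int, width: int = 8) -> Tuple[int, int]:
    """Add two unsigned width-bit integers; returns (sum mod 2**width, carry_out)."""
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")

    # Bitwise generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]            # running group generate / propagate
    levels = width.bit_length() - 1

    # Upper tree: combine spans whose size doubles at every level.
    for d in range(levels):
        step = 1 << d
        for i in range(2 * step - 1, width, 2 * step):
            G[i] |= P[i] & G[i - step]
            P[i] &= P[i - step]

    # Inverse tree: fill in the remaining prefixes, from wide spans to narrow.
    for d in range(levels - 2, -1, -1):
        step = 1 << d
        for i in range(3 * step - 1, width, 2 * step):
            G[i] |= P[i] & G[i - step]
            P[i] &= P[i - step]

    # Carry into bit i is the group generate of bits 0..i-1 (carry-in is 0).
    carries = [0] + G[:-1]
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, G[-1]
```

For instance, brent_kung_add(255, 1) returns (0, 1): the 8-bit sum wraps to zero and the carry-out is set. The upper tree resolves the carries at positions 1, 3, 7, and so on; the inverse tree then derives the positions in between.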


The results show that the Baugh-Wooley multiplier has
increased speed, since its clock period is only 15.861 ns.
Pipeline stages further improve the speed of the
Baugh-Wooley architecture. The number of LUTs represents
the area required for implementation.


The Baugh-Wooley architecture requires 30 LUTs, compared to
32 in Table 1. The fan-out of each multiplier architecture
is also given, which directly indicates how readily the
multiplier can be used to build larger circuits. The same
parameters can be verified for the pipelined multiplier
architecture, where latency and speed are the important
factors. The synthesis results of the 4-bit pipelined
multipliers are shown in Table 2. Power consumption in the
Baugh-Wooley multiplier is the lowest among the conventional
multiplier units compared. It is therefore clear that signed
binary multiplication through the Baugh-Wooley method is
suited to large multiplier implementations. Improvements in
the constraints can be used to make the Baugh-Wooley
multiplier more efficient. The pipeline constraint


Fig 3: Synthesis Report of Baugh-Wooley Architecture 


Output: Simulation Result for Baugh-Wooley Architecture:
Fig 4: Simulation Result for Both Unsigned Numbers Multiplication


TABLE 1
SYNTHESIS RESULTS OF DIFFERENT 4-BIT PIPELINED
MULTIPLIER ARCHITECTURES


| Multiplier    | Number of | Fanout | Clock Period | Power Dissipation |
| Architecture  | LUTs      |        | (ns)         | (mW)              |
|---------------+-----------+--------+--------------+-------------------|
| Add-and-Shift | 74        | 18     | 8.939        | 68.89             |
| Baugh-Wooley  | 32        | 104    | 15.029       | 67.84             |

TABLE 2
MY SYNTHESIS RESULTS OF DIFFERENT 4-BIT
PIPELINED MULTIPLIER ARCHITECTURES


| Multiplier    | Number of | Fanout | Clock Period | Power Dissipation |
| Architecture  | LUTs      |        | (ns)         | (mW)              |
|---------------+-----------+--------+--------------+-------------------|
| Add-and-Shift | 74        | 18     | 8.939        | 68.89             |
| Baugh-Wooley  | 30        | 46     | 5.921        | 13.3              |



increases the speed of the multiplier considerably, at the
cost of an increase in power consumption. For the
Baugh-Wooley multiplier, the clock period reduces to
3.321 ns as a result of the pipeline registers. This
improves the speed, which might otherwise be reduced by the
BK adder used in my architecture. The maximum delay for
this architecture is 2.143 ns. The design uses 65 flip-flops
out of 17088, and the maximum frequency is 527.037 MHz,
which is a good sign. Pipelined multipliers can thus be
incorporated effectively to make the chip efficiently
reconfigurable between the two reconfiguration modes; this
work is in progress. Other reconfiguration constraints are
being investigated, and implementing the reconfiguration
modes according to these constraints is future work.


Fig. 5: Simulation Result for Both Signed Numbers Multiplication


IV. CONCLUSIONS 


An efficient multiplier that deals with the latency problem
is proposed. This paper presents a comparison between
various multiplier architectures with area, speed and power
as the main constraints. It is observed that the
Baugh-Wooley architecture gives optimized values for the
various constraints and is hence suited to long-operand
multiplication below 32 bits. Pipeline register techniques
are used to improve the multiplier characteristics.